feat: add Iceberg v3 type definitions #752

Open
zhjwpku wants to merge 4 commits into apache:main from zhjwpku:add-iceberg-v3-types

Conversation

@zhjwpku zhjwpku commented Jun 16, 2026

Collaborator

Introduce the Iceberg v3 types (variant, geometry, geography), including their schema/JSON serialization and type-system integration (visitors, schema projection, etc.).

Reading and writing data of these types is not implemented yet: conversion to/from Arrow, Avro, and Parquet returns an error, as do identity transform binding and scalar validation for them.

@zhjwpku zhjwpku force-pushed the add-iceberg-v3-types branch from 1c1f95c to 00c7860 Compare June 16, 2026 16:54
@zhjwpku zhjwpku force-pushed the add-iceberg-v3-types branch from 00c7860 to 2b09f55 Compare June 16, 2026 16:54
Comment thread src/iceberg/json_serde.cc Outdated

@WZhuo WZhuo left a comment

Contributor


LGTM

Comment thread src/iceberg/type.h Outdated
explicit GeometryType(std::string crs);
~GeometryType() override = default;

[[nodiscard]] std::string_view crs() const;

Member


Suggested change
- [[nodiscard]] std::string_view crs() const;
+ std::string_view crs() const;

Collaborator Author


Done. I removed all remaining [[nodiscard]] annotations in type.h.

Comment thread src/iceberg/type.h Outdated
bool Equals(const Type& other) const override;

private:
std::optional<std::string> crs_;

Member


An empty string is enough to represent a missing CRS.

Collaborator Author


Agreed, changed to std::string.
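
For readers following this thread, here is a minimal sketch of the pattern agreed on above: storing the CRS as a plain `std::string` and treating the empty string as "no CRS". `GeometrySketch` and `has_crs()` are hypothetical names used only for illustration; this is not the class from `src/iceberg/type.h`.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, simplified stand-in for the geometry type discussed in this
// thread. It only illustrates the "empty string means missing CRS" idea.
class GeometrySketch {
 public:
  // An empty string means that no CRS was specified.
  explicit GeometrySketch(std::string crs = {}) : crs_(std::move(crs)) {}

  // Returns the CRS as given, or an empty view when it is missing.
  std::string_view crs() const { return crs_; }

  // Convenience check added for this sketch only.
  bool has_crs() const { return !crs_.empty(); }

  bool operator==(const GeometrySketch& other) const {
    return crs_ == other.crs_;
  }

 private:
  std::string crs_;  // a plain string instead of std::optional<std::string>
};
```

One consequence of this choice is that an explicitly empty CRS and an absent CRS can no longer be distinguished, which is acceptable as long as an empty CRS is never a meaningful value.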

Comment thread src/iceberg/type.h Outdated
Comment thread src/iceberg/type.h Outdated

Copilot AI left a comment



Pull request overview

This PR introduces Iceberg v3 type-system support by adding the new types variant, geometry, and geography (plus EdgeAlgorithm for geography), and wiring them through the existing visitor/type utilities, schema/JSON parsing & serialization, and compatibility checks. Data read/write support for these types is explicitly not implemented yet (Arrow/Avro/Parquet conversions and identity transform binding return errors).

Changes:

  • Add v3 TypeIds (kVariant, kGeometry, kGeography) and corresponding Type implementations (including CRS/edge algorithm handling and stringification).
  • Integrate v3 types into visitors, schema projection/utilities, transforms, and format-version gating; return NotSupported for unsupported IO/conversions.
  • Extend and adjust unit tests to cover v3 type parsing/printing and “unsupported” behavior in conversions/transforms.
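
The "return NotSupported for unsupported IO/conversions" behavior summarized above could look roughly like the sketch below. All names (`TypeIdSketch`, `StatusSketch`, `CheckConvertible`) are assumptions made for illustration and do not reflect the project's actual error or status API.

```cpp
#include <string>
#include <string_view>

// Hypothetical type IDs; only the three v3 types come from the PR summary.
enum class TypeIdSketch { kInt, kString, kVariant, kGeometry, kGeography };

enum class ErrorKindSketch { kOk, kNotSupported };

// Hypothetical status object standing in for the project's result type.
struct StatusSketch {
  ErrorKindSketch kind = ErrorKindSketch::kOk;
  std::string message;
  bool ok() const { return kind == ErrorKindSketch::kOk; }
};

// True for the types this PR introduces.
inline bool IsNewV3Type(TypeIdSketch id) {
  return id == TypeIdSketch::kVariant || id == TypeIdSketch::kGeometry ||
         id == TypeIdSketch::kGeography;
}

// Rejects the new types up front instead of attempting a partial conversion.
inline StatusSketch CheckConvertible(TypeIdSketch id, std::string_view target) {
  if (IsNewV3Type(id)) {
    return {ErrorKindSketch::kNotSupported,
            "Conversion to " + std::string(target) +
                " is not supported for this type yet"};
  }
  return {};
}
```

Failing early with a dedicated error kind lets callers tell "not implemented yet" apart from malformed input.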

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/iceberg/util/visitor_generate.h | Extends generated visitor action lists and adds explicit dispatch for variant in “primitive default” switch. |
| src/iceberg/util/visit_type.h | Updates categorical visitor docs to include a fifth category for variant. |
| src/iceberg/util/type_util.h | Adds VariantType overloads to schema/type utility visitors. |
| src/iceberg/util/type_util.cc | Implements VariantType visitor handling and adjusts projection logic to treat non-nested leaf types consistently. |
| src/iceberg/util/struct_like_set.cc | Returns NotSupported for scalar validation of v3 types. |
| src/iceberg/update/update_schema.cc | Adds VisitVariant handling in schema-update visitor. |
| src/iceberg/type.h | Adds VariantType, GeometryType, GeographyType, factories, and edge-algorithm APIs; updates type factory group docs. |
| src/iceberg/type.cc | Implements v3 type behavior, factories, TypeId/EdgeAlgorithm string conversions and parsing. |
| src/iceberg/type_fwd.h | Adds new TypeIds, EdgeAlgorithm, and forward declarations for new types. |
| src/iceberg/transform.cc | Disables identity transform for geometry/geography. |
| src/iceberg/transform_function.cc | Enforces identity-transform input-type restrictions for geometry/geography. |
| src/iceberg/test/visit_type_test.cc | Extends type test cases to include v3 types and updates nested-vs-non-nested expectations. |
| src/iceberg/test/type_test.cc | Extends type test cases, adjusts nested checks, and adds geography default/algorithm equality tests. |
| src/iceberg/test/transform_test.cc | Adds coverage ensuring identity transform rejects v3 types. |
| src/iceberg/test/schema_test.cc | Adds schema projection test coverage for variant fields. |
| src/iceberg/test/schema_json_test.cc | Adds JSON round-trip and invalid-input tests for v3 type strings (case/spacing/algorithms). |
| src/iceberg/test/rest_json_serde_test.cc | Updates expected error message to match new “Cannot parse type string” behavior. |
| src/iceberg/test/arrow_test.cc | Adds test asserting Arrow conversion rejects v3 types. |
| src/iceberg/table_metadata.h | Gates v3 types behind Iceberg format version >= 3. |
| src/iceberg/schema_internal.cc | Refactors Arrow schema conversion to return Status, improves error reporting with type paths, and rejects v3 types explicitly. |
| src/iceberg/parquet/parquet_writer.cc | Adds VisitVariant to metrics collector visitor. |
| src/iceberg/parquet/parquet_schema_util.cc | Rejects reading v3 types from Parquet schema evolution validation. |
| src/iceberg/parquet/parquet_metrics.cc | Adds VisitVariant to metrics visitor. |
| src/iceberg/metrics_config.cc | Treats variant as a non-nested leaf for metrics field-id limiting. |
| src/iceberg/json_serde.cc | Adds JSON serialization and parsing for v3 types (including CRS and edge algorithm); normalizes primitive parsing to be case-insensitive. |
| src/iceberg/delete_file_index.cc | Adjusts equality-delete bound conversion to skip any non-primitive types (avoids mis-casting variant). |
| src/iceberg/avro/avro_schema_util.cc | Rejects writing/reading v3 types to/from Avro with NotSupported. |
| src/iceberg/avro/avro_schema_util_internal.h | Declares Avro visitor overloads for v3 types. |
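
As an illustration of the `table_metadata.h` row ("gates v3 types behind Iceberg format version >= 3"), a version gate might be sketched as follows. `GatedType`, `MinFormatVersion`, and `IsTypeAllowed` are hypothetical names, and the sketch only covers the three types named in this PR; the real check may be structured differently.

```cpp
// Hypothetical type IDs; kInt stands in for every pre-v3 type.
enum class GatedType { kInt, kVariant, kGeometry, kGeography };

// Minimum table format version in which a type may appear (sketch only).
constexpr int MinFormatVersion(GatedType type) {
  switch (type) {
    case GatedType::kVariant:
    case GatedType::kGeometry:
    case GatedType::kGeography:
      return 3;
    default:
      return 1;
  }
}

// A schema using `type` is valid only if the table's format version is at
// least the minimum required by that type.
constexpr bool IsTypeAllowed(GatedType type, int format_version) {
  return format_version >= MinFormatVersion(type);
}
```

Keeping the minimum version in one lookup function means a later type addition only needs a new `case`, not a new check at each call site.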


Comment thread src/iceberg/type.h
Comment on lines 48 to +52
  /// \brief Get the type ID.
- [[nodiscard]] virtual TypeId type_id() const = 0;
+ virtual TypeId type_id() const = 0;

  /// \brief Is this a primitive type (may not have child fields)?
- [[nodiscard]] virtual bool is_primitive() const = 0;
+ virtual bool is_primitive() const = 0;
Comment thread src/iceberg/type.h
Comment on lines 54 to +55
  /// \brief Is this a nested type (may have child fields)?
- [[nodiscard]] virtual bool is_nested() const = 0;
+ virtual bool is_nested() const = 0;
Comment thread src/iceberg/type.h
Comment on lines 60 to +62
  protected:
  /// \brief Compare two types for equality.
- [[nodiscard]] virtual bool Equals(const Type& other) const = 0;
+ virtual bool Equals(const Type& other) const = 0;
Comment thread src/iceberg/type.h
Comment on lines 78 to +79
  /// \brief Get a view of the child fields.
- [[nodiscard]] virtual std::span<const SchemaField> fields() const = 0;
+ virtual std::span<const SchemaField> fields() const = 0;
Comment thread src/iceberg/type.h
Comment on lines 98 to +99
  /// \brief Get a field by name (case-sensitive).
- [[nodiscard]] Result<std::optional<SchemaFieldConstRef>> GetFieldByName(
-     std::string_view name) const;
+ Result<std::optional<SchemaFieldConstRef>> GetFieldByName(std::string_view name) const;
Comment thread src/iceberg/type.h
Comment on lines 325 to +329
  /// \brief Get the precision (the number of decimal digits).
- [[nodiscard]] int32_t precision() const;
+ int32_t precision() const;
  /// \brief Get the scale (essentially, the number of decimal digits after
  /// the decimal point; precisely, the value is scaled by $$10^{-s}$$.).
- [[nodiscard]] int32_t scale() const;
+ int32_t scale() const;
Comment thread src/iceberg/type.h
Comment on lines 378 to +381
  /// \brief Is this type zoned or naive?
- [[nodiscard]] virtual bool is_zoned() const = 0;
+ virtual bool is_zoned() const = 0;
  /// \brief The time resolution.
- [[nodiscard]] virtual TimeUnit time_unit() const = 0;
+ virtual TimeUnit time_unit() const = 0;
Comment thread src/iceberg/type.h
Comment on lines 500 to +501
  /// \brief The length (the number of bytes to store).
- [[nodiscard]] int32_t length() const;
+ int32_t length() const;
Comment thread src/iceberg/test/type_test.cc Outdated
Comment thread src/iceberg/test/visit_type_test.cc Outdated

5 participants